Method and system for deep neural networks using dynamically selected feature-relevant points from a point cloud

ABSTRACT

Methods and systems for deep neural networks using dynamically selected feature-relevant points from a point cloud are described. A plurality of multidimensional feature vectors arranged in a point-feature matrix are received. Each row of the point-feature matrix corresponds to a respective one of the multidimensional feature vectors, and each column of the point-feature matrix corresponds to a respective feature. Each multidimensional feature vector represents a respective unordered point from a point cloud and includes a respective plurality of feature-correlated values, each feature-correlated value representing a correlation extent of the respective feature. A reduced-max matrix having a selected plurality of feature-relevant vectors is generated. The feature-relevant vectors are selected by, for each respective feature, identifying a respective multidimensional feature vector in the point-feature matrix having a maximum feature-correlated value associated with the respective feature. The reduced-max matrix is output to at least one neural network layer.

TECHNICAL FIELD

The present disclosure relates to a deep neural network that includes at least one layer that is applied to dynamically select feature-relevant points from a point cloud.

BACKGROUND

Deep learning, a promising approach to solve challenges in various fields (e.g., computer vision, speech recognition, etc.), is a machine learning method based on learning data representations in artificial intelligence (AI) systems.

Deep learning has been useful in providing discriminative information of ordered data, such as two-dimensional (2D) images having ordered pixels, for object classification tasks.

Three-dimensional (3D) sensors (such as red green blue-depth (RGB-D) cameras, LIDAR sensors, etc.) may capture 3D information about a surrounding environment and generate a set of data points in space that are representative of the captured 3D information. The set of data points in three-dimensional space is generally referred to in the art as a point cloud and is provided by the 3D sensors as 3D data.

However, the 3D data typically is in the form of unordered points having non-uniform sampling density. The point cloud is therefore a set of data points in three-dimensional space with irregular formation, and processing of 3D data usually suffers from irregular sampling issues, such as high computation cost and inaccuracy of object classification.

Accordingly, it would be desirable to provide a solution for applying deep learning to unordered points with effective point feature extraction and high accuracy classification, to achieve state-of-the-art performance on point cloud data.

SUMMARY

The present disclosure provides methods and systems for dynamically selecting critical points by including at least one critical point layer (CPL) in a deep neural network. The selected critical points may be applied in at least one fully-connected layer (FCL) for classification, in which each of the selected critical points is recognized and classified with respect to the boundary and/or location of an object in accordance with different scenarios. In some examples, selecting the critical points takes into consideration the weighting or contribution of each different respective critical point, using a weighted CPL (WCPL). In some examples, the methods and systems of the present disclosure may be used in various different applications, such as control of an autonomous vehicle or any other learning-based processing system. Such a configuration of the deep neural network may help to improve classification accuracy and may help to lower the complexity cost of completing subsequent object classification tasks in different applications.

In some example aspects, the present disclosure describes a method that includes: receiving a plurality of multidimensional feature vectors arranged in a point-feature matrix, each row of the point-feature matrix corresponding to a respective one of the multidimensional feature vectors, and each column of the point-feature matrix corresponding to a respective feature, each multidimensional feature vector representing a respective unordered data point of a point cloud and each multidimensional feature vector including a respective plurality of feature-correlated values, each feature-correlated value representing a correlation extent of the respective feature; generating a reduced-max matrix having a selected plurality of feature-relevant vectors, each row of the reduced-max matrix corresponding to a respective one of the feature-relevant vectors and each column of the reduced-max matrix corresponding to the respective feature; wherein the feature-relevant vectors are selected by, for each respective feature, identifying a respective multidimensional feature vector in the point-feature matrix having a maximum feature-correlated value associated with the respective feature; and outputting the reduced-max matrix for processing by a final convolution layer of a deep neural network.

In any of the preceding aspects/embodiments, the generating may include: generating an index vector containing row indices of the identified multidimensional feature vectors; generating a sampled index vector by sampling the row indices in the index vector to a desired number; and generating the reduced-max matrix using the row indices contained in the sampled index vector.

In any of the preceding aspects/embodiments, the sampling is deterministic, and the method may further comprise: prior to the sampling, sorting the row indices contained in the index vector in an ascending order.

In any of the preceding aspects/embodiments, the desired number is predefined for performing batch processing for different respective point clouds.

In any of the preceding aspects/embodiments, at least two identified respective multidimensional feature vectors in the point-feature matrix are identical and correspond to an identical data point, and the method further comprises: for the at least two identified respective multidimensional feature vectors corresponding to the identical data point, the generating further comprises: selecting a respective unique row index associated with the at least one identified respective multidimensional feature vector; generating a unique index vector that includes a plurality of respective unique row indices each corresponding to a different respective point; and generating the reduced-max matrix having a selected plurality of feature-relevant vectors based on the unique index vector, wherein the selected feature-relevant vectors are different with respect to each other.

In any of the preceding aspects/embodiments, outputting the reduced-max matrix comprises: providing the reduced-max matrix as input to the final convolution layer, the final convolution layer performing feature extraction on the feature-relevant vectors to obtain a desired number of represented features in each feature-relevant vector; and providing the output of the final convolution layer to an object classification subsystem of the deep neural network to classify the selected plurality of feature-relevant vectors.

In any of the preceding aspects/embodiments, the receiving comprises: receiving a plurality of the unordered data points of the point cloud; generating a plurality of transformed data by applying preliminary spatial transformation and filtering to the received unordered data points; and providing the plurality of transformed data to a convolutional layer of a feature extraction subsystem of the deep neural network to generate the plurality of multidimensional feature vectors.

In any of the preceding aspects/embodiments, the plurality of unordered points are captured by a LIDAR sensor or a red green blue-depth (RGB-D) camera.

In some aspects, the present disclosure describes a method implemented in a deep neural network, the method comprising: receiving a plurality of unordered data points of a point cloud; encoding the plurality of unordered data points using a convolutional layer of the deep neural network to generate a plurality of multidimensional feature vectors arranged in a point-feature matrix, each row of the point-feature matrix corresponding to a respective one of the multidimensional feature vectors, and each column of the point-feature matrix corresponding to a respective feature, the respective multidimensional feature vector representing a respective unordered data point from the point cloud and including a plurality of feature-correlated values each representing a correlation extent of the respective feature; providing the point-feature matrix to a critical point layer (CPL) to: generate a reduced-max matrix having a selected plurality of feature-relevant vectors, each row of the reduced-max matrix corresponding to a respective one of the feature-relevant vectors and each column of the reduced-max matrix corresponding to the respective feature, wherein the feature-relevant vectors are selected by, for each respective feature, identifying a respective multidimensional feature vector in the point-feature matrix having a maximum feature-correlated value associated with the respective feature; and output the reduced-max matrix to a final convolution layer of the deep neural network; and outputting a plurality of classified points by applying the reduced-max matrix to at least one neural network layer.

In any of the preceding aspects/embodiments, the CPL may be applied to: generate an index vector containing row indices of the identified multidimensional feature vectors; generate a sampled index vector by sampling the row indices in the index vector to a desired number; and generate the reduced-max matrix using the row indices contained in the sampled index vector.

In any of the preceding aspects/embodiments, sampling the row indices in the index vector to a desired number is deterministic sampling, and the CPL may be further applied to: prior to the sampling, sort the row indices contained in the index vector in an ascending order.

In any of the preceding aspects/embodiments, the desired number is predefined for performing batch processing for different respective point clouds.

In any of the preceding aspects/embodiments, at least two identified respective multidimensional feature vectors in the point-feature matrix are identical and correspond to an identical point, and the CPL may be further applied to: for the at least two identified respective multidimensional feature vectors corresponding to the identical point, select a respective unique row index associated with the at least one identified respective multidimensional feature vector; set a unique index vector that includes a plurality of respective unique row indices each corresponding to a different respective point; and generate the reduced-max matrix having a selected plurality of feature-relevant vectors based on the unique index vector, wherein the selected feature-relevant vectors are different with respect to each other.

In some aspects, the present disclosure describes a system configured to: receive a plurality of multidimensional feature vectors arranged in a point-feature matrix, each row of the point-feature matrix corresponding to a respective one of the multidimensional feature vectors, and each column of the point-feature matrix corresponding to a respective feature, each multidimensional feature vector representing a respective unordered data point from a point cloud and each multidimensional feature vector including a respective plurality of feature-correlated values, each feature-correlated value representing a correlation extent of the respective feature; generate a reduced-max matrix having a selected plurality of feature-relevant vectors, each row of the reduced-max matrix corresponding to a respective one of the feature-relevant vectors and each column of the reduced-max matrix corresponding to the respective feature, wherein the feature-relevant vectors are selected by, for each respective feature, identifying a respective multidimensional feature vector in the point-feature matrix having a maximum feature-correlated value associated with the respective feature; and output the reduced-max matrix for processing by a final convolution layer of a deep neural network.

In any of the preceding aspects/embodiments, the system may be further configured to: generate an index vector containing row indices of the identified multidimensional feature vectors; generate a sampled index vector by sampling the row indices in the index vector to a desired number; and generate the reduced-max matrix using the row indices contained in the sampled index vector.

In any of the preceding aspects/embodiments, the sampling is deterministic, and the system may be configured to: prior to the sampling, sort the row indices contained in the index vector in an ascending order.

In any of the preceding aspects/embodiments, the desired number is predefined for performing batch processing for different respective point clouds.

In any of the preceding aspects/embodiments, at least two identified respective multidimensional feature vectors in the point-feature matrix are identical and correspond to an identical point, and the system may be further configured to: for the at least two identified respective multidimensional feature vectors corresponding to the identical point, select a respective unique row index associated with the at least one identified respective multidimensional feature vector; generate a unique index vector that includes a plurality of respective unique row indices each corresponding to a different respective point; and generate the reduced-max matrix having a selected plurality of feature-relevant vectors based on the unique index vector, wherein the selected feature-relevant vectors are different with respect to each other.

In any of the preceding aspects/embodiments, the system may be further configured to: provide the reduced-max matrix as input to the final convolution layer of the deep neural network, the final convolution layer performing feature extraction on the feature-relevant vectors to obtain a desired number of represented features in each feature-relevant vector; and provide the output of the final convolution layer to an object classification subsystem of the deep neural network to classify the selected plurality of feature-relevant vectors.

In any of the preceding aspects/embodiments, the system may be further configured to: receive a plurality of the unordered data points of the point cloud; generate a plurality of transformed data by applying preliminary spatial transformation and filtering to the received unordered data points; and provide the plurality of transformed data to a convolutional layer of a feature extraction subsystem of the deep neural network to generate the plurality of multidimensional feature vectors.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating some components of an example autonomous vehicle.

FIG. 2A is a schematic diagram of an example deep neural network in accordance with one example embodiment.

FIG. 2B is a block diagram of another example deep neural network in accordance with a further example embodiment.

FIG. 3A is a flowchart illustrating an example method that may be implemented by at least one CPL of FIG. 2A.

FIG. 3B is a flowchart illustrating an example method for selecting a plurality of critical points from a plurality of unordered points that may be performed by at least one CPL of FIG. 2A.

FIG. 3C is a flowchart illustrating an alternative example method for selecting a plurality of critical points from a plurality of unordered points that may be performed by at least one CPL of FIG. 2A.

FIG. 4 is a pseudo-code representation of example instructions for implementing the example methods of FIGS. 3B and 3C.

FIG. 5 is a schematic diagram illustrating another example deep neural network in accordance with a further example embodiment.

Similar reference numerals may have been used in different figures to denote similar components.

DESCRIPTION OF EXAMPLE EMBODIMENTS

Some examples of the present disclosure are described in the context of autonomous vehicles. However, the methods and systems disclosed herein may also be suitable for implementation outside of autonomous devices, for example in non-vehicular devices and non-autonomous devices. For example, any system or device that performs object classification or segmentation on unordered points may benefit from the examples described here. Further, examples of the present disclosure may be implemented in image processing devices including cameras or sensors, such as workstations or other computing devices not related to autonomous machines (e.g., image processing workstations for classifying or analyzing radar data or ultrasound data).

Although examples described herein refer to a motor vehicle as the autonomous vehicle, the teachings of the present disclosure may be implemented in other forms of autonomous or semi-autonomous vehicles including, for example, trams, subways, trucks, buses, watercraft, aircraft, ships, drones (also called unmanned aerial vehicles (UAVs)), warehouse equipment, construction equipment or farm equipment, and may include vehicles that do not carry passengers as well as vehicles that do carry passengers. The methods and systems disclosed herein may also be suitable for implementation in non-vehicular devices, for example autonomous vacuum cleaners and lawn mowers.

FIG. 1 is a block diagram illustrating certain components of an example autonomous vehicle 100. Although described as being autonomous, the vehicle 100 may be operable in a fully-autonomous, semi-autonomous or fully user-controlled mode. In the present disclosure, the vehicle 100 is described in the embodiment of a car; however, the present disclosure may be implemented in other vehicular or non-vehicular machines, as discussed above.

The vehicle 100 includes a sensor system 110, a data analysis system 120, a path planning system 130, a vehicle control system 140 and an electromechanical system 150, for example. Other systems and components may be included in the vehicle 100 as appropriate. Various systems and components of the vehicle may communicate with each other, for example through wired or wireless communication. For example, the sensor system 110 may communicate with the data analysis system 120, the path planning system 130 and the vehicle control system 140; the data analysis system 120 may communicate with the path planning system 130 and the vehicle control system 140; the path planning system 130 may communicate with the vehicle control system 140; and the vehicle control system 140 may communicate with the electromechanical system 150.

The sensor system 110 includes various sensing units for collecting information about the internal and/or external environment of the vehicle 100. In the example shown, the sensor system 110 includes a radar 112, a LIDAR sensor 114, a camera 116 and a global positioning system (GPS) 118 for collecting information about the external environment of the vehicle. The sensor system 110 may include other sensing units, such as a temperature sensor, precipitation sensor or microphone, among other possibilities.

The camera 116 may be an RGB-D camera which captures a static image and/or a video comprising a set of images, for example, and generates image data representative of the captured static image and/or the images of the video. In some examples, each image may include per-pixel depth information. The image data captured by the camera 116 may be 3D image data, which may be encoded in the form of a 3D point cloud and provided as 3D data.

The LIDAR sensor 114 may capture information in a wide view (e.g., 360° view) about the vehicle 100. The LIDAR sensor 114 may also capture 3D information about the external environment, and the captured information may be encoded in the form of a set of data points in the 3D space (e.g., a 3D point cloud) and provided as 3D data, where each data point in the 3D data represents 3D coordinates (e.g., x, y and z values in meters) of a sensed object in the 3D space (e.g., the point of origin from which light is reflected from the object). In some examples, in addition to 3D coordinates, each data point in the 3D data may also contain other information, such as intensity of reflected light or time of detection.

Regardless of whether the 3D data is output by the camera 116 or by the LIDAR sensor 114 in the form of a point cloud, the data points in the point cloud may be irregularly spaced, for example depending on the external environment.

Using the various sensing units 112, 114, 116, 118, the sensor system 110 may collect information about the local external environment of the vehicle 100 (e.g., any immediately surrounding obstacles) as well as information from a wider vicinity (e.g., the radar unit 112 and LIDAR sensor 114 may collect information from an area of up to 100 m radius or more around the vehicle 100). The sensor system 110 may also collect information about the position and orientation of the vehicle 100 relative to a frame of reference (e.g., using the GPS unit 118). The sensor system 110 may further collect information about the vehicle 100 itself. In such a case, the vehicle 100 may itself be considered part of the sensed environment. For example, the sensor system 110 may collect information from sensing units (e.g., accelerometers, speedometer, odometer and/or inertial measurement unit), which may or may not be part of the sensor system 110, to determine the state of the vehicle 100, such as linear speed, angular speed, acceleration and tire grip of the vehicle 100.

The sensor system 110 communicates with the data analysis system 120 to provide sensor data, including 3D data, to the data analysis system 120, which is configured to learn to detect, identify, and classify objects in the external environment using the sensor data, for example to detect, identify, and classify a pedestrian or another car. The data analysis system 120 may be any suitable learning-based machine perception system that implements machine learning algorithms to learn to detect, identify, and classify objects in the external environment, for example to detect and identify a pedestrian or another car. The data analysis system 120 in this example includes a deep neural network for object classification 200A (FIG. 2A), which will be described in greater detail below. The data analysis system 120 may be implemented using software, which may include any number of independent or interconnected modules or control blocks, for example to implement any suitable machine learning algorithms to perform feature extraction using 3D data received from the LIDAR sensor 114 and/or the camera 116, and object classification using the extracted features. The software of the data analysis system 120 may be executed using one or more processing units of a vehicle controller (not shown) of the vehicle 100, as described below. The output of the data analysis system 120 may include, for example, data identifying objects, including for each object an object class or label.

Sensor data from the sensor system 110 and the output of the data analysis system 120 may be provided to the path planning system 130. The path planning system 130 carries out path planning for the vehicle 100. For example, the path planning system 130 may plan a path for the vehicle 100 to travel from a starting point to a target destination, using information from the GPS unit 118. The path planning system 130 may be implemented as one or more software modules or control blocks carried out by one or more processing units in the vehicle 100. In some examples, the path planning system 130 may perform path planning at different levels of detail, for example on a mission planning level, on a behavior planning level, and on a motion planning level. The output from the path planning system 130 may include data defining one or more planned paths for the vehicle 100 to travel. The path planning carried out by the path planning system 130 is performed in real-time or near real-time, to enable the vehicle 100 to be responsive to real-time changes in the sensed environment. Output from the path planning system 130 may be provided to the vehicle control system 140.

The vehicle control system 140 serves to control operation of the vehicle 100. The vehicle control system 140 may be used to provide full, partial or assistive control of the vehicle 100. The vehicle control system 140 may serve to fully or partially control operation of the electromechanical system 150, when the vehicle 100 is operating autonomously or semi-autonomously, based on the planned path from the path planning system 130. Data received from the sensor system 110 and/or the data analysis system 120 may also be used by the vehicle control system 140. In this example, the vehicle control system 140 includes a steering unit 142, a brake unit 144 and a throttle unit 146. Each of these units 142, 144, 146 may be implemented as separate or integrated software modules or control blocks within the vehicle control system 140. The units 142, 144, 146 generate control signals to control the steering, braking and throttle, respectively, of the vehicle 100. The vehicle control system 140 may include additional components to control other aspects of the vehicle 100 including, for example, control of turn signals and brake lights.

The electromechanical system 150 receives control signals from the vehicle control system 140 to operate the mechanical components of the vehicle 100. The electromechanical system 150 effects physical operation of the vehicle 100. In the example shown, the electromechanical system 150 includes an engine 152, a transmission 154 and wheels 156. The engine 152 may be a gasoline-powered engine, an electricity-powered engine, or a gasoline/electricity hybrid engine, for example. Other components may be included in the electromechanical system 150, including, for example, turn signals, brake lights, fans and windows.

The vehicle 100 may include other components that are not shown, including, for example, a user interface system and a wireless communication system (e.g., including an antenna). These other components may also provide input to and/or receive output from the above-described systems. The vehicle 100 may communicate with an external system, for example an external map database. The vehicle 100 may also communicate with a network, for example a vehicle network that enables communication among autonomous, semi-autonomous or non-autonomous vehicles.

The sensor system 110, data analysis system 120, path planning system 130 and the vehicle control system 140 may individually or in combination be realized, at least in part, in one or more processing units of a vehicle controller (not shown) of the vehicle 100. The one or more processing units may be central processing units, graphics processing units, tensor processing units, or any combination thereof. For example, the vehicle controller (not shown) of the vehicle 100 may include a processing unit having one or more physical processors (e.g., a microprocessor, microcontroller, digital signal processor, field programmable gate array, or application specific integrated circuit) coupled to one or more tangible memories (not shown). A processing unit may also be referred to as a computing unit or a controller. The memory(ies) may store instructions, data and/or software modules for execution by the processing unit(s) to carry out the functions of the systems described herein. The memory(ies) may store other software instructions and data for implementing other operations of the vehicle 100. Each memory may include any suitable volatile and/or non-volatile storage and retrieval device(s). Any suitable type of memory may be used, such as random access memory (RAM), read only memory (ROM), hard disk, optical disc, subscriber identity module (SIM) card, memory stick, secure digital (SD) memory card, and the like.

As discussed above, 3D data output by the RGB-D camera 116 or the LIDAR sensor 114 in the form of a point cloud is irregular and unordered. That is, the set of data points forming the point cloud is irregularly spaced and may be indexed in an arbitrary order. Various learning-based machine perception systems that implement conventional machine learning algorithms have been developed and used for classifying objects in unordered point clouds. However, when there is a large number of unordered data points in the point cloud, this may pose challenges for object classification, such as increased computation cost and possible inaccuracy of classification in the learning-based machine perception systems. Moreover, point cloud data (e.g., 3D data) may include redundant data, which may cause processing of the 3D data for object classification to be slow and may introduce significant delays, which may be undesirable in various scenarios, such as autonomous driving.

A deep neural network, as disclosed herein, may help to improve object classification accuracy and reduce computational cost, and may address at least some drawbacks of the above-discussed conventional deep machine learning algorithms. The disclosed deep neural network may be used in various applications, including autonomous driving, robotics, or drone navigation in artificial intelligence (AI) systems.

FIG. 2A is a schematic diagram of an example deep neural network 200A, which may be used in the data analysis system 120 for object classification, in accordance with an example embodiment. The deep neural network 200A has a hierarchical structure, which includes at least a feature extraction subsystem 201, a convolution layer 202, and an object classification subsystem 203. For ease of illustration, only one feature extraction subsystem 201 and one object classification subsystem 203 are discussed in the example of FIG. 2A. The feature extraction subsystem includes a convolution layer 205 for performing a convolution operation, such as edge convolution, and a critical point layer (CPL) 207. The feature extraction subsystem 201 may be implemented using one or more suitable neural networks (e.g., a CNN) to produce feature vectors for a respective unordered data point. Another deep neural network 200B, having a different hierarchical structure with a plurality of feature extraction subsystems 201 and an object classification subsystem 203, will be described further below with reference to FIG. 2B.

As shown in FIG. 2A, 3D data in the form of a point cloud is received as input to the feature extraction subsystem 201 of the deep neural network 200A. The point cloud includes a plurality of unordered data points. With respect to one point cloud, a plurality of labeled critical data points are output from the deep neural network 200A. The plurality of labeled critical data points are all assigned one object class label. The other unordered data points which are not selected as critical data points may be discarded to save computational cost. For example, in the context of an autonomous vehicle, some critical points from a point cloud may be labeled as representing a pedestrian. For ease of illustration, processing the plurality of unordered data points of one point cloud is discussed herein and below. In some examples, the plurality of unordered data points may be sensed 3D data from the sensor system 110 (e.g., from the LIDAR sensor 114 or camera 116) shown in FIG. 1. Each unordered data point may be represented by x, y and z values of a 3D coordinate. That is, each unordered data point may be represented as a 3D vector including respective values in x, y and z features. The plurality of unordered data points (which may be in the form of 3D vectors) is provided to the convolution layer 205 to generate a plurality of multidimensional feature vectors arranged in a point-feature matrix 211. Each row of the point-feature matrix 211 corresponds to a respective feature vector, and each feature vector represents a respective unordered data point from the point cloud. A feature vector includes feature-correlated values, representing a plurality of features, which are output by the convolution layer 205. Each feature-correlated value contained in the feature vector may be considered a dimension of the feature vector; hence, the feature vector may be considered a multidimensional feature vector. Each column of the point-feature matrix 211 corresponds to a respective one of the plurality of features. Thus, a multidimensional feature vector represents a given unordered point, and the multidimensional feature vector includes a plurality of feature-correlated values, where each feature-correlated value represents the extent to which the given unordered point correlates to the respective feature. The plurality of feature-correlated values and the correlation extent will be discussed further below.

The convolution layer 205 may be referred to as a "filtering stage", because the convolution layer 205 may use a filter or a kernel to generate the feature vectors. The number of features that are identified by the convolution layer 205 may be associated with a filtering factor. In this example, the number of features for each multidimensional feature vector is larger than 3. In some implementations, the plurality of features associated with a respective multidimensional feature vector may be different from the original 3 features (e.g., the x, y and z coordinate values) associated with the original 3D vector representing an unordered data point. In addition, the convolution layer 205 may help to increase the number of features correlated to a respective unordered data point. In some examples, different data points in the point cloud may originally be associated with different numbers of features (e.g., having fewer or greater than three features). Regardless of the number of features associated with the different unordered data points inputted to the convolution layer 205, the feature vectors outputted from the convolution layer 205 may have the same number of features. For example, the convolution layer 205 may be designed to output feature vectors representing a predefined set of features (e.g., predefined according to desired object classes). The convolution layer 205 may thus output feature vectors representing the same features, to represent different unordered points. This may enable subsequent object classification steps to be performed more readily.
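
To make the shapes of this filtering stage concrete, the following is a minimal NumPy sketch (an illustration only, not the patented convolution layer 205) of a shared per-point map that turns an N×3 point cloud into an N×F point-feature matrix; the function name and the weights W and b are hypothetical stand-ins for the learned kernel:

    import numpy as np

    def point_feature_matrix(points: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
        """Toy "filtering stage": apply one shared linear map plus ReLU to every
        unordered point, so an (N, 3) cloud becomes an (N, F) point-feature matrix.
        W of shape (3, F) and b of shape (F,) stand in for the learned kernel."""
        return np.maximum(points @ W + b, 0.0)  # each row holds feature-correlated values

    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(8, 3))             # 8 unordered data points (x, y, z)
    W = rng.normal(size=(3, 14))                # hypothetical learned weights
    b = rng.normal(size=14)
    pfm = point_feature_matrix(cloud, W, b)
    print(pfm.shape)                            # (8, 14), as in the FIG. 2A example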

For ease of illustration, the example discussed herein refers to the point-feature matrix 211 being outputted from the convolution layer 205 to a CPL, for down-sampling data points from the point cloud by generating a reduced-max matrix 213. This is only illustrative and is not intended to be limiting. In other examples, the point-feature matrix 211 may have different configurations. For example, the point-feature matrix 211 may represent a different number of points than the example shown in FIG. 2A, based on the number of unordered points received as input. The point-feature matrix 211 may also represent a different number of features than the example shown in FIG. 2A, in accordance with the filtering factor of the convolution layer 205. In the example shown in FIG. 2A, the point-feature matrix 211 includes 8 multidimensional feature vectors (having row indices 0-7), and each multidimensional feature vector has 14 features. Thus, the point-feature matrix 211 is an 8×14 matrix having 8 rows corresponding to 8 unordered points of a point cloud, and 14 columns corresponding to 14 features.

The CPL 207 is provided the point-feature matrix 211 output by the convolution layer 205. The CPL 207 uses the point-feature matrix 211 to generate a reduced-max matrix 213. Each row of the reduced-max matrix 213 represents a respective feature-relevant vector. A feature-relevant vector is a multidimensional feature vector that is selected from the point-feature matrix 211, as discussed further below. In this example, each feature-relevant vector represents a critical data point in the point cloud. In the present disclosure, the term "critical" refers to a data point that is selected to be represented in the reduced-max matrix 213 because of its feature-relevance. A critical data point has more importance for the purpose of object classification than other data points that were not selected to be represented in the reduced-max matrix 213. For ease of illustration and understanding, the reduced-max matrix 213 output from the CPL 207 is discussed herein and is also used as an example to demonstrate the configuration of each critical data point in the further example below. This is only illustrative and is not intended to be limiting. In other examples, the dimensions of the reduced-max matrix 213 may be different, for example based on different methods to select critical data points. The identification and selection of the critical data points from the point-feature matrix 211 to produce the reduced-max matrix 213 will be discussed in greater detail below.

Each column of the reduced-max matrix 213 corresponds to a respective feature. It should be noted that the number of columns of the point-feature matrix 211 and the number of columns of the reduced-max matrix 213 are equal, which means that a multidimensional feature vector and a feature-relevant vector represent the same features. The number of rows of the reduced-max matrix 213 is less than the number of rows of the point-feature matrix 211. The CPL 207 generates the reduced-max matrix 213 by, for each feature, selecting a respective multidimensional feature vector in the point-feature matrix 211 having a maximum feature-correlated value associated with that feature. In particular, for each feature (i.e., each column of the point-feature matrix 211), the CPL 207 identifies the multidimensional feature vector having the highest value contained in that column, and selects that identified multidimensional feature vector to be a feature-relevant vector to be included in the reduced-max matrix 213.
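
In code, this selection rule amounts to a column-wise argmax followed by a row gather; a minimal sketch, continuing with the hypothetical 8×14 matrix pfm from the previous example:

    # For each of the 14 features (columns), find the row index of the point
    # with the maximum feature-correlated value, then gather those rows.
    winners = pfm.argmax(axis=0)          # shape (14,): one row index per feature
    critical_rows = np.unique(winners)    # each selected point is kept only once
    reduced_max = pfm[critical_rows]      # rows are the feature-relevant vectors
    print(reduced_max.shape)              # (n_critical, 14), with n_critical <= 8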

Thus, the reduced-max matrix 213 only includes multidimensional feature vectors from the point-feature matrix 211 that have a maximum feature-correlated value for at least one feature. Such a method for selecting critical data points, as represented by feature-relevant vectors, using at least one CPL 207, may help to improve classification accuracy by the object classification subsystem 203 of the deep neural network 200A. This may also help to reduce the complexity of computation, as the number of critical points is significantly less than the total number of captured points in the point cloud.

The convolution layer 202 receives the reduced-max matrix 213 and performs a convolution operation on the reduced-max matrix 213. The output of the convolution layer 202 is a set of critical data points. The set of critical data points may be represented in the form of feature vectors arranged in a reduced-max matrix with further features. The feature vectors are then provided to the object classification subsystem 203, which performs object classification to assign the critical points in the point cloud an object class label. The output of the object classification subsystem 203 includes a plurality of labeled critical data points that are associated with and correspond to an object. The object classification subsystem 203 can be implemented using any suitable classifier, such as a neural network or a support vector machine (SVM).

As noted above, the convolution layer 205 is configured to increase the number of features for a respective unordered data point in the point cloud, and the CPL 207 is configured to select critical data points. In some implementations, the convolution layer 205 with a filtering factor i may increase the number of features of a respective unordered point to i, and the CPL 207 with a down-sampling factor j may reduce the number of critical data points to 1/j of the number of unordered data points.

In some alternative examples, the feature extraction subsystem 201, including the convolution layer 205 and the CPL 207, may be applied as many times as desired to constitute a deep neural network having a different respective hierarchical structure. The deep neural network may be designed to achieve a desired number of critical data points and a desired number of features, for example in order to attain a desired accuracy in object classification.

FIG. 2B shows an example deep neural network 200B having a plurality of feature extraction subsystems 201(1) to 201(3) (generally referred to as feature extraction subsystems 201) in accordance with a further example embodiment. The deep neural network 200B may be designed with a plurality of feature extraction subsystems 201 in order to output a desired number of critical data points each with a desired number of features. Each feature extraction subsystem 201 can be configured to output a predefined number of critical data points and a predefined number of features for each critical point.

For example, N unordered data points of a point cloud are received as input by the first feature extraction subsystem 201(1). In some examples, an arbitrary number of features are associated with each one of the N unordered data points. The first feature extraction subsystem 201(1) includes a first convolution layer 205(1) with a first filtering factor F_(k1) and a first CPL 207(1) with a first down-sampling factor k1. Thus, the output of the first feature extraction subsystem 201(1) is N/k1 critical data points each having F_(k1) features. The number of critical data points may be further reduced, and the number of features further increased, by one or more subsequent feature extraction subsystems 201. The number of feature extraction subsystems 201 used to process the unordered data points is determined based on the desired number of output points of the deep neural network 200B, and the desired number of features associated with the output of the deep neural network 200B. In the example shown in FIG. 2B, the deep neural network 200B includes three feature extraction subsystems 201(1), 201(2), and 201(3) and a final layer (e.g., convolution layer 202). Generally, each feature extraction subsystem 201(i) includes a convolution layer 205(i) with a filtering factor F_(ki) and a CPL 207(i) with a down-sampling factor ki, where i is an integer. In this example, the three feature extraction subsystems 201 are used to output N/(k1×k2×k3) critical data points, each critical data point having F_(k3) features, as a reduced-max matrix 213(3). The convolution layer 202, having a filtering factor F_(k4), acts as a final convolutional operation to be applied; thus, the output of the convolution layer 202 is N/(k1×k2×k3) critical data points with each data point having F_(k4) features. The output of the convolution layer 202 is the reduced-max matrix 213(3) with further features, which is provided to the object classification subsystem 203 to perform object classification on the critical data points, as discussed above.
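
The shape bookkeeping through the stacked subsystems can be checked with a few lines; the factor values below are hypothetical, chosen only to mirror the FIG. 2B structure:

    N = 1024                      # number of input points (hypothetical)
    k = [4, 2, 2]                 # CPL down-sampling factors k1, k2, k3 (hypothetical)
    F = [64, 128, 256]            # filtering factors F_k1, F_k2, F_k3 (hypothetical)
    F_k4 = 512                    # filtering factor of the final convolution layer 202

    points = N
    for i, (ki, Fi) in enumerate(zip(k, F), start=1):
        points //= ki             # CPL 207(i) keeps N/(k1*...*ki) critical points
        print(f"subsystem 201({i}) out: {points} points x {Fi} features")
    print(f"network out: {points} points x {F_k4} features")  # N/(k1*k2*k3) x F_k4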

It is noted that the number of convolution layers may be different than the number of CPLs, and the convolution layers may be used separately from the CPLs, in order to facilitate an output of a deep neural network that achieves a desired number of features (by using different convolution layers) and attains a desired number of critical data points (by using different CPLs). The desired number of features and the desired number of critical points in the output may be predefined. The example of FIG. 2B is only used for illustration and is not intended to be limiting. In other examples, different numbers of convolution layers 205 and CPLs 207 may be applied in different configurations of the deep neural network.

FIG. 3A shows an example method 300 for applying the feature extraction subsystem 201 to a plurality of multidimensional feature vectors to output a plurality of critical data points. The method 300 may be performed by the feature extraction subsystem 201 of the deep neural network 200A, 200B.

At action 301, a plurality of multidimensional feature vectors arranged in a point-feature matrix 211 are received. Each row of the point-feature matrix corresponds to a respective one of the multidimensional feature vectors, and each column of the point-feature matrix corresponds to a respective feature.

As shown in FIG. 3A, the point-feature matrix 211 includes 8 multidimensional feature vectors with row indices 0-7, and each multidimensional feature vector has 14 features. That means each multidimensional feature vector is a 14-dimensional (14D) feature vector. In some examples, the feature extraction subsystem 201 may encode unordered data points (e.g., 3D data in the form of a point cloud captured by the LIDAR sensor 114 or RGB-D camera 116) into the 8 14D feature vectors. Thus, each multidimensional feature vector represents a respective unordered data point from the point cloud.

With respect to the point-feature matrix 211, for a given feature (i.e., a given column), the correlation extent is used to evaluate which multidimensional feature vector makes a greater contribution with respect to this feature and is more important than the other multidimensional feature vectors for this feature. The feature-correlated value of a multidimensional feature vector for a given feature may be directly proportional to the correlation extent of the multidimensional feature vector for that given feature. For example, the larger the feature-correlated value of a multidimensional feature vector, the more important the unordered data point corresponding to the respective multidimensional feature vector, for that given feature.

At action 302, a reduced-max matrix 213 having a selected plurality of feature-relevant vectors is generated. Each row of the reduced-max matrix 213 corresponds to a respective feature-relevant vector, and each column of the reduced-max matrix 213 corresponds to a respective feature. The feature-relevant vectors are selected by identifying, for each feature, the respective multidimensional feature vector in the point-feature matrix 211 having the maximum feature-correlated value associated with that feature.

For example, consider a first column associated with a first feature in the point-feature matrix 211 shown in FIG. 3A. In the first column, the feature-correlated value at row index 4 has the maximum value (marked as a dark entry in the point-feature matrix 211) among all rows of the point-feature matrix 211. That indicates that the point corresponding to the multidimensional feature vector at row index 4 contributes the most to the first feature. This is only illustrative and is not intended to be limiting. In other examples, the correlation extent may be inversely proportional to the feature-correlated values.

It should be noted that a single multidimensional feature vector may contain more than one maximum feature-correlated value, for different features. For example, in FIG. 3A, the multidimensional feature vector at row index 4 contains maximum feature-correlated values for the first column as well as for the second column. It should also be noted that one or more multidimensional feature vectors of the point-feature matrix 211 may contain no maximum feature-correlated values. For example, the multidimensional feature vector at row index 2 does not contain any maximum feature-correlated values. In the event that two or more multidimensional feature vectors contain equally high maximum feature-correlated values for a given feature, all such multidimensional feature vectors may be selected as feature-relevant vectors for that feature; alternatively, one of the two or more multidimensional feature vectors may be randomly selected as the feature-relevant vector for that feature.
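
For the tie case, a plain argmax always returns the first maximal row, so the randomized alternative mentioned above needs an explicit tie-break; a small sketch of one way this could be done:

    import numpy as np

    rng = np.random.default_rng(0)
    col = np.array([0.2, 0.9, 0.9, 0.1])     # two rows tie for the maximum value
    ties = np.flatnonzero(col == col.max())  # row indices of all maximal entries
    winner = rng.choice(ties)                # random tie-break among them
    # np.argmax(col) alone would deterministically return the first maximum (row 1)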

As shown in FIG. 3A, the reduced-max matrix 213 includes a selected plurality of feature-relevant vectors. As described above, each feature-relevant vector represents a respective critical data point. The CPL 207 is applied to output the reduced-max matrix 213, in which the number of critical data points is less than the number of multidimensional feature vectors of the point-feature matrix 211, and the number of features of a respective critical data point is identical to the number of features in the respective multidimensional feature vector. The CPL 207 may thus help to reduce the number of data points required to be processed by subsequent convolution layers of the deep neural network, which may help to save computation cost.

At action 303, the reduced-max matrix 213 is outputted to at least the final convolution layer 202. The convolution layer 202 performs a final feature extraction (e.g., applying a final filtering factor as discussed above), which may serve to achieve a constant number of features for all the critical data points, and outputs the reduced-max matrix 213 with further features. In some examples, the deep neural network includes an object classification subsystem 203, as shown in FIG. 2A, comprising one or more fully connected layers (not shown). The reduced-max matrix 213 with further features, as outputted from the convolution layer 202, may be provided as input to the object classification subsystem 203. The object classification subsystem 203 performs object classification on the output of the final convolution layer 202 to assign the plurality of critical data points from the point cloud an object class label.

As the reduced-max matrix 213 includes a selected plurality of feature-relevant vectors, each having at least one maximum feature-correlated value associated with a different respective feature, the number of critical points output from the CPL may be reduced significantly, compared to the number of unordered data points captured from the sensors. Moreover, critical data points are identified and selected based on the significance and contribution of each data point to different respective features. Thus, critical data points may be selected dynamically, as opposed to using conventional static down-sampling to select a fixed subset of data points from the point cloud without consideration of the feature-relevance of each data point. Thus, the variable number of critical data points output from the CPL may help to dynamically reduce the complexity of subsequent object classification and greatly improve the accuracy of subsequent object classifications.

FIG. 3B illustrates, in greater detail, an example method 300B for generating a reduced-max matrix by applying the CPL 207. FIG. 4 shows a pseudo-code representation of example instructions of an algorithm 400 for implementing the example method 300B of FIG. 3B. Functions of the algorithm 400 may be applied to implement the steps described below. After the point-feature matrix 211 is received at action 301, the method 300B further comprises:

At action 3021, an index vector 304 containing row indices of identified multidimensional feature vectors is generated. An example approach for identifying the row indices in the point-feature matrix 211, for generating the index vector 304, is now discussed. For each respective feature, the feature-correlated values associated with the respective feature are evaluated for all multidimensional feature vectors. The row index corresponding to the multidimensional feature vector having the maximum feature-correlated value is identified for each respective feature. The identified row indices are then stored in the index vector 304. In some examples, identification of the maximum feature-correlated value for a respective feature may be implemented by an argmax( ) function 401, as shown in line 4 of FIG. 4.
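
Action 3021 reduces to a single column-wise argmax; a minimal sketch with a stand-in matrix (in the network, the values would come from the convolution layer 205):

    import numpy as np

    rng = np.random.default_rng(0)
    pfm = rng.normal(size=(8, 14))       # stand-in for point-feature matrix 211
    index_vector = pfm.argmax(axis=0)    # action 3021: the argmax( ) of FIG. 4, line 4
    # One row index per feature (14 entries); a point holding the maximum
    # feature-correlated value for several features appears several times.
    print(index_vector)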

For example, in the point-feature matrix 211 shown in FIG. 3B, different feature-correlated values represent different correlation extents for a first feature represented in the first column. The multidimensional feature vector at row index 4 (corresponding to the fifth row in the point-feature matrix 211) is identified as having the maximum feature-correlated value associated with the first feature (marked as a dark entry in the point-feature matrix 211). Therefore, the row index 4 is entered as the first element in the index vector 304. Indices are similarly identified and entered for all other features, to constitute the index vector 304.

At action 3022, a unique index vector is generated. The unique index vector 305 is generated from the index vector 304 by removing repetitions in the index entries. For example, in FIG. 3B, the generated index vector 304 shows row index 4 entered 3 times, corresponding to the first, second, and eighth features. This is because the multidimensional feature vector at row index 4 has maximum feature-correlated values associated with the first, second, and eighth features. The row index 4 is entered only once in the unique index vector 305. Thus, in the example shown in FIG. 3B, the unique index vector 305 includes a plurality of unique row indices (e.g., 4, 0, 5, 3, 6, 1 . . . ) each corresponding to a different respective multidimensional feature vector. In some examples, generating the unique index vector 305 may be implemented by a unique( ) function 402, presented in line 7 of FIG. 4.

At action 3023, the row indices contained in the unique index vector 305 are sorted in an ascending order (or descending order) and a sorted index vector 306 is generated. As shown in FIG. 3B, the row indices in the unique index vector 305 and in the sorted index vector 306 are the same, except in a sorted order. Sorting the row indices in the sorted index vector 306 may be beneficial for performing a subsequent sampling action, such as for performing deterministic sampling. In some implementations, the action 3023 may be performed by using a sort( ) function 403, shown in line 6 of FIG. 4.

At action 3024, a sampled index vector 307 is generated by sampling the row indices contained in the sorted index vector 306 to obtain a desired number of entries. This may be useful for batch processing, to ensure that the number of critical data points that are processed in subsequent layers is kept constant. The sampling can be up-sampling to a desired number or down-sampling to a desired number. The desired number may be determined based on a parameter criterion of the algorithm used for the batch processing, for example. The sampling may be performed using deterministic sampling or stochastic sampling. The deterministic sampling may include nearest neighbor resizing sampling. The stochastic sampling may include random uniform sampling by using a random integer generator. In this example, up-sampling is used to process the sorted index vector 306 and generate the sampled index vector 307 having the desired number (e.g., 12) of row indices. The up-sampling may be implemented by using a rand( ) function 404, as shown in line 8 of FIG. 4. The result of the sampling at action 3024 is that the number of critical data points outputted is the same for different point clouds that might have different numbers of data points.
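
The deterministic variant named here can be read as a nearest-neighbor resize of the sorted index vector; a sketch under that reading (the interpolation rule is an assumption, since the text only names the technique):

    import numpy as np

    def resize_nearest(sorted_idx: np.ndarray, desired: int) -> np.ndarray:
        """Deterministically up- or down-sample a sorted index vector to `desired`
        entries by nearest-neighbor resizing (one plausible reading of action 3024)."""
        positions = np.linspace(0, len(sorted_idx) - 1, num=desired)
        return sorted_idx[np.round(positions).astype(int)]

    sorted_idx = np.array([0, 1, 3, 4, 5, 6])   # e.g., a sorted index vector 306
    print(resize_nearest(sorted_idx, 12))        # up-sampled to 12 entries
    print(resize_nearest(sorted_idx, 4))         # or down-sampled to 4 entries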

At action 3025, the reduced-max matrix 213 is generated using the row indices contained in the sampled index vector 307. In this example, the 12 row indices in the sampled index vector 307 are used to gather the corresponding rows from the point-feature matrix. The gathered rows are the feature-relevant vectors contained in the reduced-max matrix 213. It should be noted that, because the sampled index vector 307 may contain repeated row indices (e.g., due to the up-sampling performed at action 3024), the feature-relevant vectors contained in the reduced-max matrix 213 may contain repetitions.

Then, at action 303, the reduced-max matrix 213 is output to the final convolution layer 202 of the deep neural network 200A, 200B, for performing a final feature extraction. Output from the final convolution layer 202 can then be provided to an object classification subsystem 203 for performing object classification and labeling.
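
Assembled, actions 3021 through 3025 fit in one short function. The following is a sketch of what the algorithm 400 of FIG. 4 might look like in NumPy; the argmax( ), unique( ), sort( ) calls and the random sampling follow the figure's description, while everything else (the name, signature, and the up/down-sampling branch) is an assumption:

    import numpy as np

    def critical_point_layer(pfm: np.ndarray, desired: int,
                             rng: np.random.Generator) -> np.ndarray:
        """One reading of method 300B: pfm is the point-feature matrix 211 and
        the return value is a reduced-max matrix 213 with `desired`
        feature-relevant vectors."""
        idx = pfm.argmax(axis=0)             # action 3021: index vector 304
        idx = np.unique(idx)                 # action 3022: unique index vector 305
        idx.sort()                           # action 3023: sorted index vector 306
                                             # (np.unique already sorts; the call is
                                             # kept to mirror the figure's steps)
        if len(idx) < desired:               # action 3024: sample to the desired
            extra = rng.choice(idx, size=desired - len(idx))  # number (stochastic
            idx = np.sort(np.concatenate([idx, extra]))       # up-sampling here)
        else:
            idx = np.sort(rng.choice(idx, size=desired, replace=False))
        return pfm[idx]                      # action 3025: gather the rows

    rng = np.random.default_rng(0)
    pfm = rng.normal(size=(8, 14))           # stand-in for matrix 211
    print(critical_point_layer(pfm, desired=12, rng=rng).shape)   # (12, 14)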

Another example method 300C for outputting a reduced-max matrix 213 is now illustrated in greater detail with reference to FIG. 3C. The example method 300C of FIG. 3C may be implemented using the pseudo-code shown in FIG. 4, with the function unique( ) 402 removed. This example is similar to the example of FIG. 3B; however, the action 3022 is skipped to take into account the weighting of the multidimensional feature vectors. In this example, the CPL 207 may be referred to as a weighted critical point layer (WCPL) to implement the method 300C.

As shown in FIG. 3C, the row indices identified in the index vector 304 are sorted and sampled by performing the actions 3023 and 3024 without generating the unique index vector. A weighted sorted index vector 308 is output by implementing the action 3023 on the index vector 304. The weighted sorted index vector 308 includes all the row indices identified at action 3021, including repetitions. For example, as shown in a dashed circle 310, the repetition number of row index 4 is 3 and the repetition number of row index 5 is 2. This shows the relative feature-relevance of row index 4, compared to row index 5.

The weighted sorted index vector 308 is then provided as input to perform the action 3024. A weighted sampled index vector 309 is output by performing the action 3024. In this example, the sampling performed at action 3024 is down-sampling. As presented in dashed circle 311, the row index 4 is selected 3 times while the row index 5 is selected only once to constitute the weighted sampled index vector 309. With reference to the dashed circle 310 of the weighted sorted index vector 308 and the dashed circle 311 of the weighted sampled index vector 309, it is noted that, in the weighted sorted index vector 308, the repetition number of a first row index (e.g., 4) is higher than the repetition number of a second row index (e.g., 5). Thus, when performing the sampling at action 3024, the first row index, repeated with higher frequency, is more likely to be selected than the second row index, repeated with lower frequency. The weighted sampled index vector 309 is thus more likely to contain a greater number of entries corresponding to the row index having greater weight (and greater feature-relevance).

By comparing the sampled index vector 307 of FIG. 3B with the weighted sampled index vector 309 of FIG. 3C, it can be seen that the weighted sampled index vector 309 includes row index 4 three times and row index 5 only once, whereas the sampled index vector 307 includes row index 4 two times and row index 5 two times. Accordingly, when a WCPL is applied as disclosed in this example, a row index corresponding to a first multidimensional feature vector is likely to be repeated more often in the weighted sorted index vector 308 if that vector has more maximum feature-correlated values associated with different features. By taking the contribution or weighting of each respective multidimensional feature vector into consideration, those multidimensional feature vectors having more feature-relevant contributions, or larger weight, may be selected in the sampling process with greater probability. This may help to ensure that critical points of greater importance are selected for subsequent processing, with greater accuracy.

Regarding the method 300B of FIG. 3B and the method 300C of FIG. 3C, it is noted that the actions 3022, 3023, 3024 may each be optional, and each optional action 3022, 3023, 3024 may be performed or omitted independently of the others. For example, one or more of the actions 3022, 3023, 3024 may be omitted based on desired generation requirements of the critical data points and the reduced-max matrix 213.

In some examples, the action 3022 may be skipped. That is, the corresponding function unique( ) 402 may be removed from the algorithm 400. The result is that the row indices are weighted to represent the relative importance of the multidimensional feature vectors. This, for example, has been discussed with reference to FIG. 3C above.

In some examples, the action 3023 may be omitted. For example, if the sampling of the action 3024 is a stochastic sampling, the action 3023 of sorting may be omitted. If the sampling of the action 3024 is a deterministic sampling, the action 3023 may be performed to enable output of the CPL to be more accurate. If the action 3023 is omitted, the function sort( ) 403 may be removed from the algorithm 400 accordingly. When the action 3023 is omitted, the row indices may remain in the original entry order of the index vector 304. As a result, the order of feature-relevant vectors arranged in the reduced-max matrix 213 may correspond to the original entry order of the indices in the index vector 304.

In some examples, if the desired number of critical data points has not been predefined, or if it is not necessary to ensure that the same number of critical data points is outputted for different point clouds, the action 3024 may be omitted. In that case, the function rand( ) 404 shown in FIG. 4 may be removed accordingly, and the number of critical data points in the reduced-max matrix 213 may equal the number of multidimensional feature vectors identified in the action 3021.

In some applications, the optional actions 3022, 3023, and 3024 may be performed collectively to generate the reduced-max matrix 213, as discussed in the example method 300B illustrated in FIG. 3B. In some other applications, the optional actions 3022, 3023, and 3024 may be performed selectively to generate different reduced-max matrices 213. In some other implementations, any combination of performing or omitting the optional actions 3022, 3023, and 3024 may be used to generate different reduced-max matrices 213. Thus, the generated reduced-max matrix 213 may reflect functional features of the respective functions 402-404, which correspond to the optional actions 3022-3024 respectively. When any of the optional actions 3022-3024 is omitted, the reduced-max matrix 213 may be generated without the functional features corresponding to that action.

For example, if only the action 3022 is performed and the actions 3023 and 3024 are omitted, the generated reduced-max matrix 213 will include a plurality of feature-relevant vectors which are unique. That is, critical data points corresponding to the generated reduced-max matrix 213 may have no repetitions.

If only the action 3023 is performed and the actions 3022 and 3024 are omitted, the feature-relevant vectors in the reduced-max matrix 213 will be arranged according to the sorted index order.

If only the action 3024 is performed and the actions 3022 and 3023 are omitted, the number of feature-relevant vectors in the reduced-max matrix 213 will equal a desired number.

In some other implementations, other combinations of the optional actions 3022, 3023, 3024 may be used, as would be appreciated by one skilled in the art. Further, in some examples, the optional actions 3022, 3023, 3024 may be performed in any suitable order. By identifying those vectors having maximum correlation with certain features (e.g., using a CPL 207), a plurality of feature-relevant vectors corresponding to critical data points are outputted in the form of a reduced-max matrix 213. This reduces the number of data points to be processed in subsequent convolution layers of the deep neural network. This may help to reduce computation cost for subsequent classification operations performed by the object classification subsystem 203, and may help to improve the subsequent classification operations.
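To make these combinations concrete, the following Python sketch strings the optional actions together behind flags. All names are hypothetical, NumPy is assumed, and np.argmax stands in for the per-feature identification at action 3021; note that np.unique also sorts, so the use_sort flag matters mainly when use_unique is False:

    import numpy as np

    def critical_point_layer(features: np.ndarray,
                             desired_num: int | None = None,
                             use_unique: bool = True,
                             use_sort: bool = True) -> np.ndarray:
        """Select feature-relevant rows (critical data points) from an
        n-by-f point-feature matrix, with actions 3022-3024 made optional."""
        indices = np.argmax(features, axis=0)  # action 3021: one row index per feature
        if use_unique:
            indices = np.unique(indices)       # action 3022: remove repetitions
        if use_sort:
            indices = np.sort(indices)         # action 3023: ascending order
        if desired_num is not None:
            # action 3024: stochastic resampling to a fixed size
            indices = indices[np.random.randint(0, len(indices), size=desired_num)]
        return features[indices, :]            # action 3025: gather the reduced-max matrix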

FIG. 5 illustrates an example deep neural network 500 in which a CPL (or WCPL) is applied as discussed above. The CPL may be a CPL as applied in the example shown in FIG. 3B or a WCPL as applied in the example shown in FIG. 3C. For simplicity, FIG. 5 shows a CPL. The deep neural network 500 includes a feature extraction subsystem 201 comprising a transformation layer 501, a convolution layer 205, a multilayer perceptron (MLP) layer 503, and a CPL 207. The deep neural network 500 also includes a final convolution layer 202 and an object classification subsystem 203 comprising three fully connected layers 505(1), 505(2), 505(3). The deep neural network 500 is fed with a plurality of unordered data points of a point cloud. In this example, each unordered data point is represented by a 3D vector and the number of the unordered data points is n. In some other examples, each unordered point may be represented by a different respective vector having an arbitrary number of features.

The transformation layer 501 receives the plurality of 3D vectors, performs preliminary spatial transformation and filtering, and generates transformed data. In this example, the preliminary spatial transformation and filtering may be performed to help make the transformed data robust against rotation in different directions. In other examples, transformation may be an optional step and the transformation layer 501 may be omitted from the deep neural network 500. The convolution layer 205 receives the transformed data, performs encoding and filtering, and produces a plurality of multidimensional feature vectors. In this example, each multidimensional feature vector includes 128 features; thus, the output of the convolution layer 205 is a matrix having dimensions n×128. The MLP layer 503 may further increase the number of features for each vector, for example to 1024 features. A matrix of size n×1024 may thus be produced by the MLP layer 503.

The CPL 207 is then applied on the n×1024 matrix to generate a reduced number of critical data points. For example, the CPL 207 may have a down-sampling factor of 4, so that the output of the CPL 207 is a matrix of size (n/4)×1024. The number of critical data points is reduced to n/4, and each critical data point includes at least one maximum feature-correlated value, which may lower the computational complexity of subsequent convolution layers without significant loss of accuracy. The CPL 207 disclosed herein can be applied by implementing any of the methods discussed herein, to generate the plurality of critical data points arranged in the reduced-max matrix 213. The reduced-max matrix may contain n/4 rows (corresponding to the number of critical points) and 1024 columns (corresponding to the number of features). As mentioned above, the CPL 207 may be a CPL (e.g., in accordance with the method 300B) or a WCPL (e.g., in accordance with the method 300C).
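As a purely illustrative shape walk-through of the dimensions above, assuming NumPy, n = 1024, and random placeholder tensors rather than trained layer outputs:

    import numpy as np

    n = 1024
    points = np.random.rand(n, 3)        # unordered point cloud: n x 3
    feat128 = np.random.rand(n, 128)     # placeholder for convolution layer 205 output: n x 128
    feat1024 = np.random.rand(n, 1024)   # placeholder for MLP layer 503 output: n x 1024

    # CPL 207 with a down-sampling factor of 4:
    idx = np.unique(np.argmax(feat1024, axis=0))             # actions 3021-3022
    idx = idx[np.random.randint(0, len(idx), size=n // 4)]   # action 3024: resample to n/4
    reduced_max = feat1024[idx, :]                           # action 3025: gather rows
    assert reduced_max.shape == (n // 4, 1024)               # (n/4) x 1024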

The reduced-max matrix 213 is provided to the final convolution layer 202. The final convolution layer 202 further processes the n/4 critical data points, keeping the number of features per critical data point and the number of critical data points constant. The output of the final convolution layer 202 is then provided to the object classification subsystem 203 to downsize and label the critical data points. In this example, the object classification subsystem 203 includes 3 FCLs with different downsizing factors: the first FCL 505(1) has a downsizing factor of 512, the second FCL 505(2) has a downsizing factor of 256, and the third FCL 505(3) has a downsizing factor of 40. The three FCLs 505(1) to 505(3) (generically referred to as FCLs 505) are predefined for a desired number of classification labels. In this example, the three FCLs 505 are utilized to classify the output from the final convolution layer 202 into 40 label classes. This is only an example; in other implementations, the number of FCLs and the respective downsizing factors may be different, for example to form different hierarchical structures of the deep neural network in accordance with different scenarios, such as identifying pedestrians and/or other cars in autonomous driving.
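One way the three FCLs might be realized is sketched below, assuming PyTorch and reading the downsizing factors 512, 256, and 40 as the output width of each layer; this reading is an assumption for illustration, not the disclosed implementation:

    import torch.nn as nn

    # Hypothetical stack of the three FCLs of FIG. 5; each Linear width
    # follows the stated downsizing factor, ending at 40 label classes.
    classifier = nn.Sequential(
        nn.Linear(1024, 512),  # FCL 505(1)
        nn.ReLU(),
        nn.Linear(512, 256),   # FCL 505(2)
        nn.ReLU(),
        nn.Linear(256, 40),    # FCL 505(3): one logit per label class
    )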

The present disclosure provides examples in which a plurality of critical data points may be selected dynamically from a plurality of received unordered points in a point cloud. The disclosed methods may be particularly advantageous in performing classification on unordered data points of a point cloud, using a deep neural network in a complex environment, such as autonomous driving.

The present disclosure further illustrates example hierarchical structures in a deep neural network in which at least one convolution layer encodes and filters a plurality of received unordered points, having different respective arbitrary numbers of features, and outputs a point-feature matrix, and at least one CPL selects critical data points from the point-feature matrix. Such a hierarchical structure may help to improve efficiency for classifying the critical data points and may help to boost the classification accuracy of the deep neural network.

In some examples, only unique row indices are used for selecting feature-relevant vectors, and the critical data points in the output reduced-max matrix are distinct from one another. Such a method may enable redundant points to be removed, which may further help to downsize the number of critical data points to be processed, further reducing the complexity of the deep neural network.

In some examples, weighting of the row indices may be used. By taking the weight or contribution of each respective feature vector into consideration, data points of greater importance are selected for object classification, which may help to improve classification accuracy.

In some applications, unordered data points of a point cloud may be captured by LIDAR sensors and/or RGB-D cameras situated on a vehicle in an autonomous driving scenario, or on a robot completing a specific task.

Although the present disclosure describes methods and processes with actions in a certain order, one or more actions of the methods and processes may be omitted or altered as appropriate. One or more actions may take place in an order other than that in which they are described, as appropriate.

Although the present disclosure is described, at least in part, in terms of methods, a person of ordinary skill in the art will understand that the present disclosure is also directed to the various components for performing at least some of the aspects and features of the described methods, be it by way of hardware components, software or any combination of the two. Accordingly, the technical solution of the present disclosure may be embodied in the form of a software product. A suitable software product may be stored in a pre-recorded storage device or other similar non-volatile or non-transitory computer readable medium, including DVDs, CD-ROMs, USB flash disks, removable hard disks, or other storage media, for example. The software product includes instructions tangibly stored thereon that enable a processing device (e.g., a personal computer, a server, or a network device) to execute examples of the methods disclosed herein.

The present disclosure may be embodied in other specific forms without departing from the subject matter of the claims. The described example embodiments are to be considered in all respects as being only illustrative and not restrictive. Selected features from one or more of the above-described embodiments may be combined to create alternative embodiments not explicitly described, features suitable for such combinations being understood within the scope of this disclosure.

All values and sub-ranges within disclosed ranges are also disclosed. Also, although the systems, devices and processes disclosed and shown herein may comprise a specific number of elements/components, the systems, devices and assemblies could be modified to include additional or fewer of such elements/components. For example, although any of the elements/components disclosed may be referenced as being singular, the embodiments disclosed herein could be modified to include a plurality of such elements/components. The subject matter described herein is intended to cover and embrace all suitable changes in technology.

1. A method, comprising: receiving a plurality of multidimensional feature vectors arranged in a point-feature matrix, each row of the point-feature matrix corresponding to a respective one of the multidimensional feature vectors, and each column of the point-feature matrix corresponding to a respective feature, each multidimensional feature vector representing a respective unordered data point of a point cloud and each multidimensional feature vector including a respective plurality of feature-correlated values, each feature-correlated value representing a correlation extent of the respective feature; generating a reduced-max matrix having a selected plurality of feature-relevant vectors, each row of the reduced-max matrix corresponding to a respective one of the feature-relevant vectors and each column of the reduced-max matrix corresponding to the respective feature; wherein the feature-relevant vectors are selected by, for each respective feature, identifying a respective multidimensional feature vector in the point-feature matrix having a maximum feature-correlated value associated with the respective feature; and outputting the reduced-max matrix for processing by a final convolution layer of a deep neural network.
2. The method of claim 1, wherein the generating comprises: generating an index vector containing row indices of the identified multidimensional feature vectors; generating a sampled index vector by sampling the row indices in the index vector to a desired number; and generating the reduced-max matrix using the row indices contained in the sampled index vector.
3. The method of claim 2, wherein the sampling is deterministic, further comprising: prior to the sampling, sorting the row indices contained in the index vector in an ascending order.
4. The method of claim 2, wherein the desired number is predefined for performing batch processing for different respective point clouds.
5. The method of claim 1, wherein at least two identified respective multidimensional feature vectors in the point-feature matrix are identical and correspond to an identical data point, the method further comprising: for the at least two identified respective multidimensional feature vectors corresponding to the identical data point, the generating further comprises: selecting a respective unique row index associated with the at least one identified respective multidimensional feature vector; generating a unique index vector that includes a plurality of respective unique row indices each corresponding to a different respective point; and generating the reduced-max matrix having a selected plurality of feature-relevant vectors based on the unique index vector, wherein the selected feature-relevant vectors are different with respect to each other.
6. The method of claim 1, wherein outputting the reduced-max matrix comprises: providing the reduced-max matrix as input to the final convolution layer, the final convolution layer performing feature extraction on the feature-relevant vectors to obtain a desired number of represented features in each feature-relevant vector; and providing the output of the final convolution layer to an object classification subsystem of the deep neural network to classify the selected plurality of feature-relevant vectors.
7. The method of claim 1, the receiving comprising: receiving a plurality of the unordered data points of the point cloud; generating a plurality of transformed data by applying preliminary spatial transformation and filtering to the received unordered data points; and providing the plurality of transformed data to a convolutional layer of a feature extraction subsystem of the deep neural network to generate the plurality of multidimensional feature vectors.
8. The method of claim 1, wherein the plurality of unordered data points are captured by a LIDAR sensor or a red green blue-depth (RGB-D) camera.
9. A method implemented in a deep neural network, the method comprising: receiving a plurality of unordered data points of a point cloud; encoding the plurality of unordered data points using a convolutional layer of the deep neural network to generate a plurality of multidimensional feature vectors arranged in a point-feature matrix, each row of the point-feature matrix corresponding to a respective one of the multidimensional feature vectors, and each column of the point-feature matrix corresponding to a respective feature, wherein each multidimensional feature vector represents a respective unordered data point from the point cloud and includes a plurality of feature-correlated values each representing a correlation extent of the respective feature; providing the point-feature matrix to a critical point layer (CPL) to: generate a reduced-max matrix having a selected plurality of feature-relevant vectors, each row of the reduced-max matrix corresponding to a respective one of the feature-relevant vectors and each column of the reduced-max matrix corresponding to the respective feature; wherein the feature-relevant vectors are selected by, for each respective feature, identifying a respective multidimensional feature vector in the point-feature matrix having a maximum feature-correlated value associated with the respective feature; and output the reduced-max matrix to a final convolution layer of the deep neural network; and outputting a plurality of classified points by applying the reduced-max matrix to at least one neural network layer.
10. The method of claim 9, wherein the CPL is applied to: generate an index vector containing row indices of the identified multidimensional feature vectors; generate a sampled index vector by sampling the row indices in the index vector to a desired number; and generate the reduced-max matrix using the row indices contained in the sampled index vector.
11. The method of claim 10, wherein sampling the row indices in the index vector to a desired number is deterministic sampling, and the CPL is further applied to: prior to the sampling, sort the row indices contained in the index vector in an ascending order.
12. The method of claim 10, wherein the desired number is predefined for performing batch processing for different respective point clouds.
13. The method of claim 9, wherein at least two identified respective multidimensional feature vectors in the point-feature matrix are identical and correspond to an identical data point, and the CPL is further applied to: for the at least two identified respective multidimensional feature vectors corresponding to the identical data point, select a respective unique row index associated with the at least one identified respective multidimensional feature vector; set a unique index vector that includes a plurality of respective unique row indices each corresponding to a different respective data point; and generate the reduced-max matrix having a selected plurality of feature-relevant vectors based on the unique index vector, wherein the selected feature-relevant vectors are different with respect to each other.
14. A system, comprising: a non-transitory memory storage comprising instructions; and one or more processors in communication with the memory, wherein the one or more processors execute the instructions to: receive a plurality of multidimensional feature vectors arranged in a point-feature matrix, each row of the point-feature matrix corresponding to a respective one of the multidimensional feature vectors, and each column of the point-feature matrix corresponding to a respective feature, each multidimensional feature vector representing a respective unordered data point from a point cloud and each multidimensional feature vector including a respective plurality of feature-correlated values, each feature-correlated value representing a correlation extent of the respective feature; generate a reduced-max matrix having a selected plurality of feature-relevant vectors, each row of the reduced-max matrix corresponding to a respective one of the feature-relevant vectors and each column of the reduced-max matrix corresponding to the respective feature; wherein the feature-relevant vectors are selected by, for each respective feature, identifying a respective multidimensional feature vector in the point-feature matrix having a maximum feature-correlated value associated with the respective feature; and output the reduced-max matrix for processing by a final convolution layer of a deep neural network.
15. The system of claim 14, wherein the one or more processors further execute the instructions to: generate an index vector containing row indices of the identified multidimensional feature vectors; generate a sampled index vector by sampling the row indices in the index vector to a desired number; and generate the reduced-max matrix using the row indices contained in the sampled index vector.
16. The system of claim 15, wherein the sampling is deterministic, and the one or more processors further execute the instructions to: prior to the sampling, sort the row indices contained in the index vector in an ascending order.
17. The system of claim 15, wherein the desired number is predefined for performing batch processing for different respective point clouds.
18. The system of claim 14, wherein at least two identified respective multidimensional feature vectors in the point-feature matrix are identical and correspond to an identical data point, and the one or more processors further execute the instructions to: for the at least two identified respective multidimensional feature vectors corresponding to the identical data point, select a respective unique row index associated with the at least one identified respective multidimensional feature vector; generate a unique index vector that includes a plurality of respective unique row indices each corresponding to a different respective data point; and generate the reduced-max matrix having a selected plurality of feature-relevant vectors based on the unique index vector, wherein the selected feature-relevant vectors are different with respect to each other.
19. The system of claim 14, wherein the one or more processors further execute the instructions to output the reduced-max matrix by: providing the reduced-max matrix as input to the final convolution layer of the deep neural network, the final convolution layer performing feature extraction on the feature-relevant vectors to obtain a desired number of represented features in each feature-relevant vector; and providing the output of the final convolution layer to an object classification subsystem of the deep neural network to classify the selected plurality of feature-relevant vectors.
20. The system of claim 14, wherein the one or more processors further execute the instructions to: receive a plurality of the unordered data points of the point cloud; generate a plurality of transformed data by applying preliminary spatial transformation and filtering to the received unordered data points; and encode the plurality of transformed data, using a convolutional layer of a feature extraction subsystem of the deep neural network, to generate the plurality of multidimensional feature vectors.